gdpR: An R Package for studying differentially private algorithms

This paper serves as a reference for, and an introduction to, the gdpR R package. The goal of the package is to provide tools for exploring the impact of different privacy regimes on a Bayesian analysis. A strength of this framework is its ability to target the exact posterior in settings where the likelihood is too complex to express analytically.

Jordan A. Awan (Purdue University), Kevin Eng (Rutgers University), Robin Gong (Rutgers University), Nianqiao Phyllis Ju (Purdue University), Vinayak A. Rao (Purdue University)
2022-11-09

1 Introduction

The ease and pervasiveness of modern data collection technologies have raised concerns about data privacy. Differential privacy, as presented by Dwork and Roth (2013), provides a framework for rigorously defining privacy. The framework has led to the development of many "privatized" versions of existing statistical methods. Privatizing a method usually consists of introducing random noise, in some way, from a known distribution.
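As a concrete illustration of adding noise from a known distribution, the following is a minimal sketch of the Laplace mechanism applied to a sample mean. It is not taken from gdpR: the helper names `rlaplace` and `private_mean`, the clamping bounds, and the synthetic data are all assumptions made for this example. It also assumes that neighboring data sets differ by replacing one record, so the sample size is fixed and public.

```r
# Draw n Laplace(0, scale) variates as the difference of two
# independent exponential variates with mean `scale`.
rlaplace <- function(n, scale) {
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

# Hypothetical helper (not part of gdpR): release the mean of values
# clamped to [lower, upper] with Laplace noise calibrated to epsilon.
private_mean <- function(x, epsilon, lower, upper) {
  stopifnot(epsilon > 0, upper > lower, length(x) > 0)
  x <- pmin(pmax(x, lower), upper)            # clamp to bound the sensitivity
  sensitivity <- (upper - lower) / length(x)  # max change from replacing one record
  mean(x) + rlaplace(1, scale = sensitivity / epsilon)
}

set.seed(1)
x <- runif(200, min = 0, max = 1)  # synthetic data, for illustration only

# Spread of the privatized mean under three privacy budgets.
epsilons <- c(0.1, 1, 10)
noise_sd <- sapply(epsilons, function(eps) {
  sd(replicate(1000, private_mean(x, eps, lower = 0, upper = 1)))
})
print(data.frame(epsilon = epsilons, noise_sd = noise_sd))
```

Smaller values of `epsilon` correspond to stronger privacy and therefore to a noisier released mean, which is the kind of trade-off that a downstream analysis has to account for.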

2 Overview of the gdpR package

This section reviews the gdpR package.

3 Background

Some packages for interactive graphics include plotly (Sievert 2020), which interfaces with JavaScript for web-based interactive graphics, and crosstalk (Cheng and Sievert 2021), which specializes in cross-linking elements across individual graphics. The recent R Journal paper on tsibbletalk (Wang and Cook 2021) provides a good example of including interactive graphics in a journal article. It has both a set of linked plots and an animated GIF example, illustrating linking between time series plots and feature summaries.

4 Customizing tooltip design with ToOoOlTiPs

ToOoOlTiPs is a package for customizing tooltips in interactive graphics; it features these possibilities.

The palmerpenguins data (Horst et al. 2020) features three penguin species, which are shown in a lovely illustration by Allison Horst in Figure 1.

A picture of three different penguins with their species: Chinstrap, Gentoo, and Adelie.

Figure 1: Artwork by @allison_horst

Table 1 shows the first few rows of the penguins data:

Table 1: A basic table
| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|
| Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
| Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
| Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
| Adelie | Torgersen | NA | NA | NA | NA | NA | 2007 |
| Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
| Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |

Figure 2 shows an interactive plot of the penguins data, made using the plotly package.

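library(palmerpenguins)  # provides the penguins data
library(ggplot2)
library(plotly)          # ggplotly(); also re-exports %>%

# Scatter plot of bill depth against bill length, colored by species
p <- penguins %>%
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm,
             color = species)) +
  geom_point()
ggplotly(p)  # convert the static plot into an interactive widget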

Figure 2: A basic interactive plot of the Palmer penguins data, made with the plotly package. Three species of penguins are plotted with bill depth on the x-axis and bill length on the y-axis. Hovering over a point shows a tooltip with the exact bill depth and bill length for that point, along with the species name.
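The default tooltip can also be replaced with custom text. The following is a minimal sketch that uses only plotly and ggplot2, not ToOoOlTiPs: it maps a `text` aesthetic and passes `tooltip = "text"` to `ggplotly()`. The label wording and the object names `p_custom` and `w` are assumptions made for this example.

```r
library(palmerpenguins)  # provides the penguins data
library(ggplot2)
library(plotly)

# Build the tooltip label through the `text` aesthetic; ggplot2 ignores
# this aesthetic when drawing, but ggplotly() can display it on hover.
p_custom <- ggplot(
  penguins,
  aes(x = bill_depth_mm, y = bill_length_mm, color = species,
      text = paste0("Species: ", species,
                    "\nIsland: ", island,
                    "\nSex: ", sex))
) +
  geom_point()

# tooltip = "text" shows only the custom label instead of the default
# listing of every mapped aesthetic.
w <- ggplotly(p_custom, tooltip = "text")
w
```

Rows with missing bill measurements are dropped by `geom_point()` with a warning, so they do not appear in the interactive plot.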

6 Summary

We have displayed various tooltips that are available in the package ToOoOlTiPs.

6.1 CRAN packages used

plotly, crosstalk, tsibbletalk, palmerpenguins, ggplot2

6.2 CRAN Task Views implied by cited packages

Phylogenetics, Spatial, TeachingStatistics, TimeSeries, WebTechnologies

References

J. Cheng and C. Sievert. crosstalk: Inter-widget interactivity for HTML widgets. 2021. URL https://CRAN.R-project.org/package=crosstalk. R package version 1.1.1.
C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4): 211–407, 2013. URL https://doi.org/10.1561/0400000042.
A. M. Horst, A. P. Hill and K. B. Gorman. palmerpenguins: Palmer Archipelago (Antarctica) penguin data. 2020. URL https://allisonhorst.github.io/palmerpenguins/. R package version 0.1.0.
C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC, 2020. URL https://plotly-r.com.
E. Wang and D. Cook. Conversations in time: Interactive visualisation to explore structured temporal data. The R Journal, 2021. URL https://journal.r-project.org/archive/2021/RJ-2021-050/index.html.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Awan, et al., "gdpR: An R Package for studying differentially private algorithms", The R Journal, 2022

BibTeX citation

@article{dppaper,
  author = {Awan, Jordan A. and Eng, Kevin and Gong, Robin and Ju, Nianqiao Phyllis and Rao, Vinayak A.},
  title = {gdpR: An R Package for studying differentially private algorithms},
  journal = {The R Journal},
  year = {2022},
  issn = {2073-4859},
  pages = {1}
}